Feature vector normalization with combined standard and throat microphones for robust ASR
Abstract
We propose an on-line unsupervised compensation technique for robust speech recognition that combines standard and throat microphone feature vectors. The solution, called MultiEnvironment Model-based LInear Normalization with Throat microphone information (MEMLINT), is an extension of the MEMLIN formulation. The noisy standard microphone space and the throat microphone space are each modelled as a GMM, and a set of linear transformations, one for each pair of Gaussians (one from each GMM), is learnt from training stereo data. In addition, to compensate for kinds of degradation that MEMLINT does not consider, we propose to use it jointly with an on-line unsupervised acoustic model adaptation method based on rotation transformations over an expanded HMM-state space (augMented stAte space acousTic dEcoder, MATE). Experiments on a database we recorded ourselves show that the proposed approach significantly outperforms the single-microphone approach.
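The pairing of Gaussians described in the abstract can be illustrated with a minimal sketch. It is not the MEMLINT formulation itself: it assumes diagonal-covariance GMMs, a bias-only transformation per Gaussian pair (as in basic MEMLIN), clean standard-microphone features as the stereo target, and a simple product of the two GMM posteriors as the pair weight. All function and variable names are hypothetical.

```python
import numpy as np


def gmm_posteriors(frames, means, variances, weights):
    """Posterior p(component | frame) for a diagonal-covariance GMM."""
    diff = frames[:, None, :] - means[None, :, :]              # (T, K, D)
    log_lik = -0.5 * np.sum(
        diff ** 2 / variances + np.log(2.0 * np.pi * variances), axis=2
    )                                                          # (T, K)
    log_post = log_lik + np.log(weights)
    log_post -= log_post.max(axis=1, keepdims=True)            # numerical stability
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)


def learn_pair_biases(clean, noisy, post_noisy, post_throat):
    """One bias vector per (noisy Gaussian, throat Gaussian) pair.

    Assumption: the pair weight is the product of the two posteriors.
    Each bias is the weighted mean of (noisy - clean) over the stereo data.
    """
    w = post_noisy[:, :, None] * post_throat[:, None, :]       # (T, Ky, Kt)
    num = np.einsum("tij,td->ijd", w, noisy - clean)           # (Ky, Kt, D)
    den = w.sum(axis=0)[:, :, None]                            # (Ky, Kt, 1)
    return num / np.maximum(den, 1e-12)


def normalize(noisy, post_noisy, post_throat, biases):
    """Subtract the posterior-weighted bias from each noisy frame."""
    w = post_noisy[:, :, None] * post_throat[:, None, :]
    return noisy - np.einsum("tij,ijd->td", w, biases)


# Toy stereo data: the "noise" is a constant offset, purely for illustration.
rng = np.random.default_rng(0)
clean = rng.normal(size=(200, 3))
noisy = clean + 2.0
throat = 0.5 * clean + rng.normal(scale=0.1, size=clean.shape)

# Hypothetical two-component GMMs given as (means, variances, weights).
noisy_gmm = (np.array([[1.0] * 3, [3.0] * 3]), np.ones((2, 3)), np.array([0.5, 0.5]))
throat_gmm = (np.array([[-0.5] * 3, [0.5] * 3]), np.ones((2, 3)), np.array([0.5, 0.5]))

post_y = gmm_posteriors(noisy, *noisy_gmm)
post_t = gmm_posteriors(throat, *throat_gmm)
biases = learn_pair_biases(clean, noisy, post_y, post_t)
restored = normalize(noisy, post_y, post_t, biases)
```

In this toy setting the degradation is the same offset for every frame, so every pair bias converges to that offset and `restored` matches `clean`. With real data the biases would differ across Gaussian pairs, which is where the throat microphone posteriors could add information the noisy standard microphone lacks.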
Similar resources
Combination of standard and throat microphones for robust speech recognition in highly noisy environments
We present a method to combine standard and throat microphone signals for noise-robust speech recognition. Our approach is to extend the probabilistic optimum filter (POF) mapping algorithm to estimate standard microphone clean speech feature vectors from both microphones’ noisy speech feature vectors. We tested the proposed approach in two noisy speech recognition tasks. In the first task we u...
Improving the performance of MFCC for Persian robust speech recognition
The Mel Frequency cepstral coefficients are the most widely used feature in speech recognition, but they are very sensitive to noise. In this paper, to achieve satisfactory performance in Automatic Speech Recognition (ASR) applications, we introduce a new noise-robust set of MFCC vectors estimated through the following steps. First, spectral mean normalization is a pre-processing which applies to t...
Robust Speaker Recognition with Combined Use of Acoustic and Throat Microphone Speech
Accuracy of automatic speaker recognition (ASV) systems degrades severely in the presence of background noise. In this paper, we study the use of additional side information provided by a body-conducted sensor, throat microphone. Throat microphone signal is much less affected by background noise in comparison to acoustic microphone signal. This makes throat microphones potentially useful for fe...
Combined use of close-talk and throat microphones for improved speech recognition under non-stationary background noise
This paper intends to summarize recent developments and experimental results related to Automatic Speech Recognition (ASR) using signals captured with a throat-microphone. Due to the proximity of the sensor to the voice source, the signal is naturally less subject to background noise. This however yields speech sounds that have different frequency contents than with traditional microphones, and...
Automatic Speech Recognition on Vibrocervigraphic and Electromyographic Signals
Automatic speech recognition (ASR) is a computerized speech-to-text process, in which speech is usually recorded with acoustical microphones by capturing air pressure changes. This kind of air-transmitted speech signal is prone to two kinds of problems related to noise robustness and applicability. The former means the mixing of speech signal and ambient noise usually deteriorates ASR performan...
Publication date: 2008